New KEGG pathway-based interpretable features for classifying ageing-related mouse proteins

Authors

  • Fábio Fabris
  • Alex Alves Freitas
Abstract

MOTIVATION The incidence of ageing-related diseases has increased steadily in recent decades, raising the need for effective methods to analyze ageing-related protein data. These methods should have high predictive accuracy and be easily interpretable by ageing experts. To enable this, one needs interpretable classification models (supervised machine learning) and features with rich biological meaning. In this paper we propose two interpretable feature types based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and compare them with traditional feature types in hierarchical classification (a more challenging classification task regarding predictive performance) and binary classification (a classification task producing easier-to-interpret classification models). As far as we know, this work is the first to: (i) explore the potential of the KEGG pathway data in the hierarchical classification setting, (ii) use the graph structure of KEGG pathways to create a feature type that quantifies the influence of a current protein on another specific protein within a KEGG pathway graph and (iii) propose a method for interpreting the classification models induced using KEGG features.

RESULTS We performed tests measuring predictive accuracy considering hierarchical and binary class labels extracted from the Mouse Phenotype Ontology. One of the KEGG feature types leads to the highest predictive accuracy among five individual feature types across three hierarchical classification algorithms. Additionally, the combination of the two KEGG feature types proposed in this work results in one of the best predictive accuracies when using the binary class version of our datasets, while also enabling the extraction of knowledge from ageing-related data using quantitative influence information.

AVAILABILITY AND IMPLEMENTATION The datasets created in this paper will be freely available after publication.

CONTACT [email protected]

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
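
The abstract does not say how the two KEGG feature types are computed, so the following is only an illustrative sketch and not the authors' method. It assumes one feature type is binary pathway membership and that the influence of one protein on another decays with the shortest directed path between them in a pathway graph. The function names, the `decay` parameter, and the toy protein and pathway identifiers are all hypothetical.

```python
from collections import deque


def pathway_membership(protein, pathways):
    """Binary features: 1 if the protein belongs to a pathway, else 0 (assumed encoding)."""
    return {pid: int(protein in members) for pid, members in sorted(pathways.items())}


def influence(graph, source, target, decay=0.5):
    """Hypothetical influence score: decay ** (shortest directed path length).

    Returns 1.0 when source == target and 0.0 when target is unreachable.
    """
    if source == target:
        return 1.0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return decay ** (dist + 1)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


# Toy data (hypothetical, not taken from KEGG).
pathways = {"path_A": {"P1", "P2", "P3"}, "path_B": {"P3", "P4"}}
graph = {"P1": ["P2"], "P2": ["P3"], "P3": []}  # directed edges within path_A

membership = pathway_membership("P3", pathways)
score_forward = influence(graph, "P1", "P3")   # two edges away
score_backward = influence(graph, "P3", "P1")  # no directed path
```

A distance-decay score is only one possible way to turn graph structure into a number; it is used here because it yields a single quantity per protein pair that an expert can read directly.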

Similar resources

A New Statistical Approach for Recognizing and Classifying Patterns of Control Charts (RESEARCH NOTE)

Control chart pattern (CCP) recognition techniques are widely used to identify potential process problems in modern industries. Recently, artificial neural network (ANN)-based techniques have become very popular for recognizing CCPs. However, finding a suitable architecture for an ANN-based CCP recognizer and training it are time consuming and tedious. In addition, because of the black box ...

Protein Expression Profiles Characterize Distinct Features of Mouse Cerebral Cortices at Different Developmental Stages

The proper development of the mammalian cerebral cortex requires precise protein synthesis and accurate regulation of protein expression levels. To reveal signatures of protein expression in developing mouse cortices, we here generate proteomic profiles of cortices at embryonic and postnatal stages using tandem mass spectrometry (MS/MS). We found that protein expression profiles are mostly cons...

Comparative Analysis of Protein Expression Concomitant with DNA Methyltransferase 3A Depletion in a Melanoma Cell Line

DNA methyltransferase 3A (Dnmt3a), a de novo methyltransferase, has attracted a great deal of attention for its important role in tumorigenesis. We have previously demonstrated that melanoma is unable to grow in vivo under conditions of Dnmt3a depletion in a mouse model. In this study, we cultured the Dnmt3a depletion B16 melanoma (Dnmt3a-D) cell line to conduct a comparative analysis of pr...

Detecting Perturbed Subpathways towards Mouse Lung Regeneration Following H1N1 Influenza Infection

It has already been established by the systems-level approaches that the future of predictive disease biomarkers will not be sketched by plain lists of genes or proteins or other biological entities but rather integrated entities that consider all underlying component relationships. Towards this orientation, early pathway-based approaches coupled expression data with whole pathway interaction t...

iTRAQ-based proteomics profiling of Schwann cells before and after peripheral nerve injury

Objective(s): Schwann cells (SCs) have a wide range of applications as seed cells in the treatment of nerve injury during transplantation. However, there has been no report yet on the kinds of proteomic changes that occur in Schwann cells before and after peripheral nerve injury. Materials and Methods: Activated Schwann cells (ASCs) and normal Schwann cells (NSCs) were obtained from adult Wistar ra...

Journal title:
  • Bioinformatics

Volume 32, Issue 19

Pages -

Publication date: 2016